Loading Geoscience Australia's Sentinel-1 IW Backscatter from a STAC endpoint (collection 0)¶
This notebook demonstrates key steps for using Python to load preliminary Sentinel-1 IW backscatter products developed by Geoscience Australia.
For Sentinel-1, Geoscience Australia's Digital Earth (DE) branch is currently offering a suite of experimental products that we are calling collection 0, with sample data available over parts of Australia and Antarctica. The product is a collaborative effort between Digital Earth Australia and Digital Earth Antarctica.
If you have any questions, please reach out to the Digital Earth Antarctica team at digitalearthantarctica@ga.gov.au.
Table of Contents¶
- Set-up
- Working with STAC
- Searching and loading
- Transforming and visualising
  - Visualising
  - Speckle filtering
  - Masking
  - Converting to decibels
- Exporting
import matplotlib.pyplot as plt
import numpy as np
from odc.stac import load, configure_s3_access
from odc.geo import BoundingBox
from odc.geo.xr import write_cog
import pathlib
from pystac_client import Client
import xarray as xr
Working with STAC¶
STAC (SpatioTemporal Asset Catalog) is a metadata standard, commonly coupled with an API that allows users to query that metadata, making it easy to search for Earth observation datasets. DEA provides access to its STAC endpoint via Explorer.
The first step is to connect to DE's development STAC catalog, where the Sentinel-1 collection 0 data is housed.
The pystac_client Python library is used to connect to the catalog.
Data is stored in Amazon Web Services' S3 service, and is free to access without an account.
The odc-stac Python library is used to configure the required parameters for connecting to S3 without an account.
catalog = "https://explorer.dev.dea.ga.gov.au/stac"
stac_client = Client.open(catalog)
configure_s3_access(
cloud_defaults=True,
aws_unsigned=True,
)
View available collections¶
GA's Sentinel-1 data is published according to the polarisation mode used to capture the data. At this time, we publish data captured in Interferometric Wide (IW) mode. There are three products that we have experimental data for as part of our collection 0:
- VV+VH: ga_s1_iw_vv_vh_c0
- VV: ga_s1_iw_vv_c0
- HH: ga_s1_iw_hh_c0
You can see the distribution of captured data over time and space in the DEA Dev Explorer:
results = stac_client.collection_search(q="ga_s1*_c0")
collections = [result.id for result in results.collection_list()]
print(f"Available collections: {collections}")
Available collections: ['ga_s1_iw_hh_c0', 'ga_s1_iw_vv_c0', 'ga_s1_iw_vv_vh_c0']
Searching and Loading¶
This section provides examples for both Australia and Antarctica, as there are some differences between the two. Differences in polarisation are due to Sentinel-1's observation scenarios. The table below captures the primary differences in the collection 0 offering:
| Property | Australia | Antarctica |
|---|---|---|
| Primary capture mode | IW Vertical Dual-Polarisation (VV+VH) | IW Horizontal Single-Polarisation (HH) |
| Primary product | ga_s1_iw_vv_vh_c0 | ga_s1_iw_hh_c0 |
| Recommended CRS (metres) | UTM or EPSG:3577 | EPSG:3031 |
| Group by method | Solar day | Scene ID |
In Australia, there are a small number of IW Horizontal and Vertical Single-Polarisation acquisitions over south-east Queensland, but neither is the primary capture mode, and the collection 0 product only contains some data from 2024.
Australia¶
Here, we use a bounding box over Lake Sorell and Lake Crescent in Tasmania, and a date range of June 2020 through mid-July 2020.
aus_bbox = BoundingBox(
left=147.07251,
bottom=-42.22120,
right=147.24274,
top=-42.03035,
crs="EPSG:4326"
)
aus_start_date = "2020-06-01"
aus_end_date = "2020-07-15"
The first step is to search for all observations that match these criteria, referred to as items in STAC.
collections_query = ["ga_s1_iw_vv_vh_c0"]
aus_date_query = f"{aus_start_date}/{aus_end_date}"
aus_bbox_query = aus_bbox.bbox
aus_items = stac_client.search(
collections=collections_query,
datetime=aus_date_query,
bbox=aus_bbox_query
).item_collection()
print(f"Found {len(aus_items)} items")
Found 16 items
When working with GA's Sentinel-1 products, each item corresponds to a single burst, which is a subset of a scene. The items contain the full STAC metadata, so information can be extracted from the properties. For example, the code below demonstrates the number of unique scenes that the 16 bursts come from:
unique_scenes = set([item.properties["nrb:scene_id"][0] for item in aus_items])
print(f"The identified items come from {len(unique_scenes)} unique scenes")
The identified items come from 1 unique scenes
Once the items have been identified, we use odc-stac to load them.
With the odc-stac load command, you can specify:
- crs: the coordinate reference system (CRS) to project loaded data to. If not specified, the data's native CRS will be used.
- resolution: the resolution to load the data at, in the same units as the chosen CRS. If not specified, the data's native resolution will be used.
- intersects: a bounding box to clip the loaded data to. If not specified, the whole item will be loaded.
- bands: the measurements to load from the data (e.g. VV). If not specified, all measurements will be loaded.
- groupby: how to group loaded data. For Australia, the value solar_day will ensure all bursts captured on the same day are grouped together under one time-stamp (this groups bursts from multiple scenes if captured on the same day). Alternatively, the value nrb:scene_id will ensure all bursts captured within a scene are grouped together under one time-stamp.
- chunks={}: request that the data be lazily loaded - an xarray showing the expected dimensions and measurements will be returned. Data will be computed when used. If not specified, data will be loaded into memory.
Note: When selecting a CRS for data over Australia, we recommend "utm" or "EPSG:3577" to get data back in a coordinate system that uses metres. "utm" will return the UTM projection that is most appropriate given the bounding box of the data. If loading data over large portions of Australia, Australian Albers (EPSG:3577) or another CRS may be more appropriate.
# Lazy load our filtered data
aus_ds = load(
aus_items,
crs="utm",
resolution=20,
intersects=aus_bbox.boundary(),
bands=["VV", "VH", "mask"],
groupby="solar_day",
chunks={},
)
aus_ds
<xarray.Dataset> Size: 45MB
Dimensions: (y: 1062, x: 706, time: 5)
Coordinates:
* y (y) float64 8kB 5.347e+06 5.347e+06 ... 5.326e+06 5.326e+06
* x (x) float64 6kB 5.06e+05 5.06e+05 ... 5.201e+05 5.201e+05
spatial_ref int32 4B 32755
* time (time) datetime64[ns] 40B 2020-06-02T09:09:17.900221 ... 202...
Data variables:
VV (time, y, x) float32 15MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
VH (time, y, x) float32 15MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
mask (time, y, x) float32 15MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
Antarctica¶
Here, we use a bounding box over Canada Glacier in eastern Antarctica, with a date range of June 2018 through July 2018.
ant_bbox = BoundingBox(
left=162.8555,
bottom=-77.6376,
right=163.0801,
top=-77.5813,
crs="EPSG:4326"
)
ant_start_date = "2018-06-01"
ant_end_date = "2018-07-31"
The first step is to search for all observations that match these criteria, referred to as items in STAC.
collections_query = ["ga_s1_iw_hh_c0"]
ant_date_query = f"{ant_start_date}/{ant_end_date}"
ant_bbox_query = ant_bbox.bbox
ant_items = stac_client.search(
collections=collections_query,
datetime=ant_date_query,
bbox=ant_bbox_query
).item_collection()
print(f"Found {len(ant_items)} items")
Found 10 items
When working with GA's Sentinel-1 products, each item corresponds to a single burst, which is a subset of a scene. The items contain the full STAC metadata, so information can be extracted from the properties. For example, the code below demonstrates the number of unique scenes that the 10 bursts come from:
unique_scenes = set([item.properties["nrb:scene_id"][0] for item in ant_items])
print(f"The identified items come from {len(unique_scenes)} unique scenes")
The identified items come from 1 unique scenes
Once the items have been identified, we use odc-stac to load them.
With the odc-stac load command, you can specify:
- crs: the coordinate reference system (CRS) to project loaded data to. If not specified, the data's native CRS will be used.
- resolution: the resolution to load the data at, in the same units as the chosen CRS. If not specified, the data's native resolution will be used.
- intersects: a bounding box to clip the loaded data to. If not specified, the whole item will be loaded.
- bands: the measurements to load from the data (e.g. HH). If not specified, all measurements will be loaded.
- groupby: how to group loaded data. For Antarctica, the value nrb:scene_id will ensure all bursts captured within a scene are grouped together under one time-stamp.
- chunks={}: request that the data be lazily loaded - an xarray showing the expected dimensions and measurements will be returned. Data will be computed when used. If not specified, data will be loaded into memory.
Note: When selecting a CRS for data over Antarctica, we recommend "EPSG:3031", which matches the data's native projection.
# Lazy load our filtered data
ant_ds = load(
ant_items,
crs="EPSG:3031",
resolution=20,
intersects=ant_bbox.boundary(),
bands=["HH", "mask"],
groupby="nrb:scene_id",
chunks={},
)
ant_ds
<xarray.Dataset> Size: 5MB
Dimensions: (y: 374, x: 344, time: 5)
Coordinates:
* y (y) float64 3kB -1.288e+06 -1.288e+06 ... -1.296e+06 -1.296e+06
* x (x) float64 3kB 3.924e+05 3.924e+05 ... 3.992e+05 3.992e+05
spatial_ref int32 4B 3031
* time (time) datetime64[ns] 40B 2018-06-03T12:47:50.749496 ... 201...
Data variables:
HH (time, y, x) float32 3MB dask.array<chunksize=(1, 374, 344), meta=np.ndarray>
mask (time, y, x) float32 3MB dask.array<chunksize=(1, 374, 344), meta=np.ndarray>
Loading data in memory¶
Once you are happy with the area of interest, CRS, resolution, bands, and so on, you can load the data into memory using xarray's .compute() method.
This is valuable once you are ready to apply transformations to the data, or wish to visualise it, as it avoids re-reading the data from S3 every time it is used.
aus_ds = aus_ds.compute()
ant_ds = ant_ds.compute()
Transforming and visualising loaded data¶
The GA Sentinel-1 backscatter products are provided as linear gamma0 values. It is common to apply some transformations, such as masking, speckle filtering, and converting from linear scale to decibels. For collection 0, converting to other normalisation conventions (e.g. sigma0 and beta0) is not currently available.
If you need either sigma0 or beta0 for your work, please contact digitalearthantarctica@ga.gov.au with your request and information about your application, to provide context about why you need an alternative normalisation convention. This helps us understand demand for these products, which may influence whether we deliver them in future.
Visualising¶
To see the full timeseries for a particular band, the following plotting command can be used:
Australia¶
The dark areas are the two lakes.
aus_ds.VV.plot.imshow(col="time", col_wrap=3, robust=True, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x16b1096a0>
Antarctica¶
The dark area corresponds to the glacier. The bright segments are likely due to layover as there is significant terrain in this region.
ant_ds.HH.plot.imshow(col="time", col_wrap=3, robust=True, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x30571dd10>
Speckle filtering¶
Speckle filtering aims to reduce the noise present in SAR images. One commonly applied filter is the Lee filter. We have supplied a Python file that contains the Lee filter definition, as well as a function to apply it to xarray objects.
In the two examples below, we use a window size of 5 pixels. Because the Antarctic example covers a smaller area, the filtered result appears more blurred than the Australian example.
from speckle_filters import apply_lee_filter
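The supplied speckle_filters file is not reproduced in this notebook. For readers who don't have it, the following is a rough sketch of how a Lee filter can be built on scipy.ndimage.uniform_filter. This is an illustrative stand-in, not the supplied apply_lee_filter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter_sketch(img, size=5):
    """Illustrative Lee filter on a 2D NumPy array (not the supplied apply_lee_filter)."""
    # Local mean and variance within a size x size moving window
    local_mean = uniform_filter(img, size)
    local_sq_mean = uniform_filter(img**2, size)
    local_var = np.clip(local_sq_mean - local_mean**2, 0, None)
    # Use the mean local variance as a simple noise-variance estimate
    noise_var = local_var.mean()
    # Weights near 1 preserve detail (high local variance, e.g. edges);
    # weights near 0 smooth homogeneous areas towards the local mean
    weights = local_var / (local_var + noise_var)
    return local_mean + weights * (img - local_mean)
```

The supplied apply_lee_filter wraps a similar computation for use with xarray objects; the sketch above operates on plain NumPy arrays.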
We also define a plotting function that allows us to compare the filtered and unfiltered bands:
def compare_bands_plot(ds, time_index, band1, band2, cmap="Greys_r"):
ds_timestep = ds.isel(time=time_index)
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ds_timestep[band1].plot(ax=axes[0], cmap=cmap, robust=True)
ds_timestep[band2].plot(ax=axes[1], cmap=cmap, robust=True)
plt.tight_layout()
plt.show()
Australia¶
In this example, we apply the Lee filter using a uniform filter window size of 5 pixels.
aus_ds["VV_filtered"] = apply_lee_filter(aus_ds.VV, size=5)
compare_bands_plot(ds=aus_ds, time_index=0, band1="VV", band2="VV_filtered")
Antarctica¶
In this example, we apply the Lee filter using a uniform filter window size of 5 pixels.
ant_ds["HH_filtered"] = apply_lee_filter(ant_ds.HH, size=5)
compare_bands_plot(ds=ant_ds, time_index=0, band1="HH", band2="HH_filtered")
Masking¶
The GA Sentinel-1 backscatter product comes with a mask that indicates invalid pixels, along with pixels impacted by layover and shadow.
The masks have the following values:
| Value | Property |
|---|---|
| 0 | Valid |
| 1 | Shadow |
| 2 | Layover |
| 3 | Shadow and layover |
| 255 / NaN | Invalid |
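The mask values can also be used to quantify how much of each observation is affected. As a toy illustration with a hand-made mask (not the loaded data):

```python
import numpy as np
import xarray as xr

# Hand-made mask: 0 = valid, 1 = shadow, 2 = layover, 3 = shadow and layover
mask = xr.DataArray(
    np.array([[[0, 0], [1, 2]],
              [[0, 3], [0, 0]]]),
    dims=("time", "y", "x"),
)

# Fraction of valid pixels per time step
valid_fraction = (mask == 0).mean(dim=("y", "x"))
print(valid_fraction.values)  # first time step: 0.5, second: 0.75
```

The same expression works on aus_ds.mask or ant_ds.mask, giving a quick per-observation summary of layover and shadow coverage.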
The following code displays the masks and shows how to apply them.
Australia¶
Viewing the masks for each observation, there is very little layover or shadow in this area. This is consistent with there not being much terrain in the chosen area.
aus_ds.mask.plot.imshow(
col="time",
col_wrap=3,
levels=[-0.5, 0.5, 1.5, 2.5, 3.5],
cbar_kwargs={'ticks': [0, 1, 2, 3]}
)
<xarray.plot.facetgrid.FacetGrid at 0x305a79590>
Applying the mask¶
To apply the mask, we use xarray's where function, which takes the condition, followed by the values to keep if the condition is true, followed by the values to use if the condition is false.
The following code creates a new band, VV_filtered_masked, that keeps the original VV_filtered values where the mask is equal to 0, and replaces values with NaN otherwise.
For plotting, this time we use a non-grey colour bar to make the effect of the masking more obvious (although there is little masking in this scene).
aus_ds["VV_filtered_masked"] = xr.where(aus_ds.mask==0, aus_ds.VV_filtered, np.nan)
aus_ds.VV_filtered_masked.isel(time=0).plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x307c379d0>
Antarctica¶
There is significant terrain in the region that may introduce bright and dark artifacts from layover and shadow. Viewing the mask shows there are multiple areas flagged as affected by layover, and it is important to replace these values with no-data (NaN) before continuing with any analysis.
ant_ds.mask.plot.imshow(
col="time",
col_wrap=3,
levels=[-0.5, 0.5, 1.5, 2.5, 3.5],
cbar_kwargs={'ticks': [0, 1, 2, 3]}
)
<xarray.plot.facetgrid.FacetGrid at 0x116c50b00>
Applying the mask¶
To apply the mask, we use xarray's where function, which takes the condition, followed by the values to keep if the condition is true, followed by the values to use if the condition is false.
The following code creates a new band, HH_filtered_masked, that keeps the original HH_filtered values where the mask is equal to 0, and replaces values with NaN otherwise.
Each observation has its own mask, which will be applied when using this approach.
For plotting, this time we use a non-grey colour bar to make the effect of the masking more obvious.
ant_ds["HH_filtered_masked"] = xr.where(ant_ds.mask==0, ant_ds.HH_filtered, np.nan)
ant_ds.HH_filtered_masked.isel(time=0).plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x305c09090>
Converting to decibels¶
GA's Sentinel-1 backscatter product is provided as linear gamma0. For some analyses, it may be beneficial to convert the linear backscatter to decibels (dB).
The conversion equation is $$\text{backscatter}_{\text{dB}} = 10 \times \log_{10}(\text{backscatter}_\text{linear})$$
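As a quick sanity check of the equation on a few representative linear values:

```python
import numpy as np

linear = np.array([1.0, 0.1, 0.01])
db = 10 * np.log10(linear)
# 1.0 -> 0 dB, 0.1 -> -10 dB, 0.01 -> -20 dB
print(db)
```

Note that zero or negative linear values produce -inf or NaN in decibels, which is another reason to mask invalid pixels before converting.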
Australia¶
aus_ds["VV_filtered_masked_db"] = 10*np.log10(aus_ds.VV_filtered_masked)
aus_ds.VV_filtered_masked_db.plot.imshow(col="time", col_wrap=3, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x16b1dfbb0>
Antarctica¶
ant_ds["HH_filtered_masked_db"] = 10*np.log10(ant_ds.HH_filtered_masked)
ant_ds.HH_filtered_masked_db.plot.imshow(col="time", col_wrap=3, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x146857890>
Exporting¶
Once you have generated the data you need, you may wish to export it to enable further analysis.
The odc-geo Python library provides a useful function, write_cog, for exporting Cloud Optimised GeoTIFFs (COGs).
First, we'll create a folder to store the outputs in.
output_path = pathlib.Path("outputs")
output_path.mkdir(exist_ok=True)
Single time step¶
The write_cog function is designed to export a single time step.
The following code shows how to isolate the first time step and export it.
# Australia
write_cog(
aus_ds.VV_filtered_masked_db.isel(time=0),
output_path / "STAC_Australia_VV_filtered_masked_db_single_timestep.cog",
overwrite=True
)
# Antarctica
write_cog(
ant_ds.HH_filtered_masked_db.isel(time=0),
output_path / "STAC_Antarctica_HH_filtered_masked_db_single_timestep.cog",
overwrite=True
)
PosixPath('outputs/STAC_Antarctica_HH_filtered_masked_db_single_timestep.cog')
Multiple time steps¶
It is possible to wrap the above function in a loop to allow you to export all time steps in an xarray. We also include code to extract the datetime of each image, which can then be used to label the output file.
# Australia
aus_dt_strings = aus_ds.time.dt.strftime("%Y-%m-%d").values
for timestep in range(len(aus_ds.time)):
filename = output_path / f"STAC_Australia_VV_filtered_masked_db_t{timestep}_{aus_dt_strings[timestep]}.cog"
array = aus_ds.VV_filtered_masked_db.isel(time=timestep)
write_cog(array, filename, overwrite=True)
# Antarctica
ant_dt_strings = ant_ds.time.dt.strftime("%Y-%m-%d").values
for timestep in range(len(ant_ds.time)):
filename = output_path / f"STAC_Antarctica_HH_filtered_masked_db_t{timestep}_{ant_dt_strings[timestep]}.cog"
array = ant_ds.HH_filtered_masked_db.isel(time=timestep)
write_cog(array, filename, overwrite=True)